Skip to content

Moves glob/wildcard matching into Fact.#323

Open
Stringy wants to merge 8 commits intomainfrom
giles/wildcard-support
Open

Moves glob/wildcard matching into Fact.#323
Stringy wants to merge 8 commits intomainfrom
giles/wildcard-support

Conversation

@Stringy
Copy link
Contributor

@Stringy Stringy commented Feb 23, 2026

Description

Host scanning now uses globs to only get inodes for the specific files matching the globs.

Prefix map is populated with the longest prefix for each glob e.g. /etc/**/*.conf -> /etc/
/home/user/.ssh/id_{rsa,dsa} -> /home/user/.ssh/id_

Kernel captures events based on inode first and then prefix match (this behavior is unchanged) and then userspace does a glob match on the path and host_path.

Somewhat relies on a chain of PRs in the main repo (in merge order):

stackrox/stackrox#19057
stackrox/stackrox#19063
stackrox/stackrox#19089

Checklist

  • Investigated and inspected CI test results
  • Updated documentation accordingly

Automated testing

  • Added unit tests
  • Added integration tests
  • Added regression tests

If any of these don't apply, please comment below.

Testing Performed

TODO(replace-me)
Use this space to explain how you tested your PR, or, if you didn't test it, why you did not do so. (Valid reasons include "CI is sufficient" or "No testable changes")
In addition to reviewing your code, reviewers must also review your testing instructions, and make sure they are sufficient.

For more details, ref the Confluence page about this section.

Host scanning now uses globs to only get inodes for the specific files
matching the globs.

Prefix map is populated with the longest prefix for each glob
e.g. /etc/**/*.conf -> /etc/
     /home/user/.ssh/id_{rsa,dsa} -> /home/user/.ssh/id_

Kernel captures events based on inode first and then prefix match (this
behavior is unchanged) and then userspace does a glob match on the path
and host_path.
Copy link
Contributor

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks for tackling this!!

let mut new_paths = Vec::with_capacity(paths_config.len());
let mut builder = GlobSetBuilder::new();
for p in paths_config.iter() {
builder.add(Glob::new(&p.to_string_lossy())?);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to hard fail if a configured path is wrong, if we change the string at this point we might not match the strings configured by a user and we will not report things in there.

@Stringy Stringy requested a review from Molter73 February 23, 2026 15:27
Copy link
Contributor

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, can we add at least one integration test with globs? Just to make sure it is working, we can always expand later.

@Stringy Stringy requested a review from Molter73 February 24, 2026 14:21
@Stringy Stringy marked this pull request as ready for review February 24, 2026 14:21
@Stringy Stringy requested review from a team and rhacs-bot as code owners February 24, 2026 14:21
@Molter73
Copy link
Contributor

/retest

Comment on lines 47 to +63
inode_key_t inode_key = inode_to_key(file->f_inode);
const inode_value_t* inode = inode_get(&inode_key);
inode_key_t* inode_to_submit = &inode_key;
switch (inode_is_monitored(inode)) {
case NOT_MONITORED:
if (!is_monitored(path)) {
goto ignored;
}
// Matched by path prefix only, not by inode.
// Set inode to NULL so userspace knows to do glob matching.
inode_to_submit = NULL;
break;
case MONITORED:
break;
}

submit_event(&m->file_open, event_type, path->path, &inode_key, true);
submit_event(&m->file_open, event_type, path->path, inode_to_submit, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to use a map because the inode_key_t type is small, but we might want to have a per cpu array map and use a pointer to that to simplify the code even further. It would also reduce the amount of stack used for every program, which might be helpful in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyways, I don't want to block the PR on this, we can do it in a follow up.

let mut builder = GlobSetBuilder::new();
for p in paths_config.iter() {
builder.add(
Glob::new(&p.to_string_lossy())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I had mentioned this, we probably don't want to use to_string_lossy() here, this will make it so the path will not actually match if an invalid UTF-8 character is used and will silently drop events that we should emit. This will be very hard to debug, special for a user that might not be aware of this, instead we probably want to hard fail, falling back to the previous configuration or stopping altogether.

cwd = os.getcwd()
config = {
'paths': [monitored_dir, '/mounted', '/container-dir'],
'paths': [f'{monitored_dir}/**/*', '/mounted/**/*', '/container-dir/**/*'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit annoying that we now have to manually specify /**/* at the end of every directory being configured. I might look into changing this behavior in the future, maybe we can add this automatically in fact if the configured path has no glob expressions in it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious why this matches files directly under /mounted for instance, when the glob expressions explicitly says there should be 2 / after the path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this matches files directly under /mounted

** will match zero or more path segments, so matching files in /mounted is expected behaviour

config, config_file = fact_config
config['paths'] = [
f'{monitored_dir}/**/*.txt',
f'{monitored_dir}/**/test-*.log',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a case that is something like: f'{monitored_dir}/*.cfg

This should validate we don't recursively check directories when ** is not used.


```shell
cargo test --config 'target."cfg(all())".runner="sudo -E" --features=bpf-test
cargo test --config 'target."cfg(all())".runner="sudo -E"' --features=bpf-test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this one!

Comment on lines 24 to 31
txt_file = os.path.join(monitored_dir, 'document.txt')
with open(txt_file, 'w') as f:
f.write('This should be captured')

# Should not match any pattern
log_file = os.path.join(monitored_dir, 'app.log')
with open(log_file, 'w') as f:
f.write('This should be ignored')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want these the other way around, reason being the server.wait_events method does not wait for events after it caught the last one expected, so you want to trigger anything that should be ignored before things that should be caught.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, I know this should be improved, probably by checking there are no events leftover in the server buffer during teardown or something.

return config, config_file


def test_extension_wildcard(fact, wildcard_config, monitored_dir, server):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact fixture is configured as autouse, so you shouldn't need to explicitly ask for it in the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments in this file apply to all tests in here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants